AITopics | gradient calculation

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

Neural Information Processing Systems | Jun-12-2026, 03:55:42 GMT

Sharpness Aware Minimization (SAM), a recently proposed optimization algorithm for deep neural networks, perturbs parameters by a gradient ascent step before the gradient calculation in order to guide the optimization into flat-loss regions of parameter space. While significant generalization improvements, and thus reduced overfitting, have been demonstrated, the computational cost is doubled by the additional gradient calculation, making SAM infeasible when computational capacity is limited. Motivated by Nesterov Accelerated Gradient (NAG), we propose Momentum-SAM (MSAM), which perturbs parameters in the direction of the accumulated momentum vector to achieve low sharpness without significant computational overhead or memory demands over SGD or Adam. We evaluate MSAM in detail and reveal insights on separable mechanisms of NAG, SAM and MSAM regarding training optimization and generalization.
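The idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes SGD with a momentum buffer that accumulates gradients, and it assumes the perturbation is a step of fixed length `rho` along the normalized buffer (an ascent-like direction). The function name `msam_step`, the toy quadratic loss, and all hyperparameter values are hypothetical choices made for illustration.

```python
import math

def msam_step(w, v, grad_fn, lr=0.1, mu=0.9, rho=0.05):
    """One momentum-perturbed SGD update; returns the new (w, v)."""
    norm = math.sqrt(sum(m * m for m in v))
    if norm > 0.0:
        # Perturb along the accumulated momentum instead of a fresh
        # gradient, so no second gradient evaluation is needed.
        w_pert = [x + rho * m / norm for x, m in zip(w, v)]
    else:
        # No momentum yet: evaluate at the unperturbed point.
        w_pert = list(w)
    g = grad_fn(w_pert)  # the only gradient evaluation in this step
    v_new = [mu * m + gi for m, gi in zip(v, g)]
    w_new = [x - lr * m for x, m in zip(w, v_new)]
    return w_new, v_new

def loss(w):
    # Toy loss (assumption): 0.5 * (w0^2 + 10 * w1^2)
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def quadratic_grad(w):
    # Gradient of the toy loss above
    return [w[0], 10.0 * w[1]]

w, v = [1.0, 1.0], [0.0, 0.0]
for _ in range(300):
    w, v = msam_step(w, v, quadratic_grad, lr=0.05, mu=0.9, rho=0.001)
print(loss(w))
```

Compared with SAM as described above, the sketch spends one gradient evaluation per step rather than two, and the only extra state is the momentum buffer that SGD with momentum already keeps.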

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

Gradient Descent for Spiking Neural Networks

Dongsung Huh, Terrence J. Sejnowski

Neural Information Processing Systems | Feb-12-2026, 08:23:20 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, neuron, (16 more...)

Country:

North America > United States > California > San Diego County > La Jolla (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

ee76626ee11ada502d5dbf1fb5aae4d2-Supplemental.pdf

Neural Information Processing Systems | Feb-11-2026, 00:38:22 GMT

gradient calculation, param, top-kast, (14 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.41)

890e018ca9c879c5ac01757239538f7c-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 13:44:07 GMT

calculation, gradient, sbp, (16 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Gradient Descent for Spiking Neural Networks

Dongsung Huh, Terrence J. Sejnowski

Neural Information Processing Systems | Nov-20-2025, 14:46:29 GMT

Research in spike-based computation has been impeded by the lack of an efficient supervised learning algorithm for spiking neural networks.

artificial intelligence, machine learning, neuron, (16 more...)

Country:

North America > United States > California > San Diego County > La Jolla (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

778ff1fcfb6d6707fc015908a1845b62-Paper-Conference.pdf

Neural Information Processing Systems | Oct-8-2025, 22:39:53 GMT

artificial intelligence, machine learning, np 2, (15 more...)

Country: Asia > Middle East > Saudi Arabia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

890e018ca9c879c5ac01757239538f7c-Paper-Conference.pdf

Neural Information Processing Systems | Aug-16-2025, 17:33:04 GMT

artificial intelligence, deep learning, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Reviews: Neural Ordinary Differential Equations

Neural Information Processing Systems | Oct-7-2024, 12:03:07 GMT

Given the ever-increasing importance of automatic differentiation (AD) in both communities, extending the range of scientific computing primitives through which frameworks such as autograd can efficiently compute derivatives will hopefully spur more widespread use of gradient-based learning and inference methods with ODE models, and encourage other frameworks with AD capability, such as Stan, TensorFlow and PyTorch, to implement adjoint sensitivity methods. The specific suggested applications of the 'ODE solver modelling primitive' in ODE-Nets, CNFs and L-ODEs are all interesting demonstrations of some of the computational and modelling advantages that come from using a continuous-time ODE model formulation. In particular, the memory savings made possible by recomputing trajectories backwards through time, rather than storing all intermediate states, could be a major gain given that device memory is often currently a bottleneck. While 'reversing' the integration to recompute the reverse trajectory is an appealing idea, it would have helped to have more discussion of when this would be expected to break down. For example, it seems likely that highly chaotic dynamical systems would tend to be problematic, as even small errors in the initial backwards steps could soon lead to very large divergences in the reversed trajectories compared to the forward ones. A useful sanity check in an implementation would be to compare the final state of the reversed trajectory to the initial state of the forward trajectory to check how closely they agree. The submission is generally very well written and presented with a clear expository style, with useful illustrative examples given in the experiments to support the claims made and well-thought-out figures which help to give visual intuitions about the methods and results.
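The sanity check suggested in the review above could look roughly like the following sketch. It is not code from the reviewed paper: it assumes a fixed-step classical Runge-Kutta (RK4) integrator and a harmonic oscillator as the test system, and the names `rk4_step`, `integrate`, and `reversal_gap` are hypothetical.

```python
def rk4_step(f, z, t, h):
    """One classical RK4 step of size h (h may be negative)."""
    k1 = f(t, z)
    k2 = f(t + h / 2, [zi + h / 2 * k for zi, k in zip(z, k1)])
    k3 = f(t + h / 2, [zi + h / 2 * k for zi, k in zip(z, k2)])
    k4 = f(t + h, [zi + h * k for zi, k in zip(z, k3)])
    return [zi + h / 6 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def integrate(f, z0, t0, t1, n_steps):
    """Integrate dz/dt = f(t, z) from t0 to t1; t1 < t0 runs backwards."""
    h = (t1 - t0) / n_steps
    z, t = list(z0), t0
    for _ in range(n_steps):
        z = rk4_step(f, z, t, h)
        t += h
    return z

def reversal_gap(f, z0, t0, t1, n_steps):
    """Largest componentwise gap between z0 and the state recovered by
    integrating forward to t1 and then backwards to t0."""
    z1 = integrate(f, z0, t0, t1, n_steps)
    z0_recovered = integrate(f, z1, t1, t0, n_steps)
    return max(abs(a - b) for a, b in zip(z0, z0_recovered))

def oscillator(t, z):
    # Harmonic oscillator (assumption): x' = y, y' = -x
    return [z[1], -z[0]]

gap = reversal_gap(oscillator, [1.0, 0.0], 0.0, 10.0, 1000)
print(gap)
```

For a well-behaved system like this oscillator the gap should be tiny. A gap that grows quickly with the integration horizon would be a warning sign of the kind of breakdown the review raises for chaotic dynamics.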

experiment, neural ordinary differential equation, trajectory, (10 more...)

Genre: Summary/Review (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Stochastic Variance-Reduced Iterative Hard Thresholding in Graph Sparsity Optimization

Fox, Derek, Hernandez, Samuel, Tong, Qianqian

arXiv.org Machine Learning | Jul-23-2024

Stochastic optimization algorithms are widely used for large-scale data analysis due to their low per-iteration costs, but they often suffer from slow asymptotic convergence caused by inherent variance. Variance-reduced techniques have therefore been used to address this issue in structured sparse models utilizing sparsity-inducing norms or $\ell_0$-norms. However, these techniques are not directly applicable to complex (non-convex) graph sparsity models, which are essential in applications like disease outbreak monitoring and social network analysis. In this paper, we introduce two stochastic variance-reduced gradient-based methods to solve graph sparsity optimization: GraphSVRG-IHT and GraphSCSG-IHT. We provide a general framework for theoretical analysis, demonstrating that our methods enjoy a linear convergence speed. Extensive experiments validate
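To make the combination of variance reduction and hard thresholding concrete, here is a minimal sketch on a sparse least-squares problem. It is not the paper's GraphSVRG-IHT: it swaps the graph-structured projection for plain top-k hard thresholding, and the names `hard_threshold`, `sample_grad`, and `svrg_iht`, the synthetic data, and the step size are all hypothetical.

```python
import random

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero the rest."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [x if j in keep else 0.0 for j, x in enumerate(w)]

def sample_grad(w, x, y):
    # Gradient of the per-sample loss 0.5 * (<x, w> - y)^2
    r = sum(a * b for a, b in zip(x, w)) - y
    return [r * a for a in x]

def svrg_iht(X, Y, k, lr=0.02, epochs=20, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w_snap = [0.0] * d
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer loop
        full = [0.0] * d
        for x, y in zip(X, Y):
            for j, gj in enumerate(sample_grad(w_snap, x, y)):
                full[j] += gj / n
        w = list(w_snap)
        for _ in range(n):
            i = rng.randrange(n)
            g_w = sample_grad(w, X[i], Y[i])
            g_s = sample_grad(w_snap, X[i], Y[i])
            # Variance-reduced estimate: stochastic gradient corrected
            # by the same sample's gradient at the snapshot
            g = [a - b + c for a, b, c in zip(g_w, g_s, full)]
            # Gradient step followed by projection onto k-sparse vectors
            w = hard_threshold([x - lr * gj for x, gj in zip(w, g)], k)
        w_snap = w
    return w_snap

# Synthetic noise-free data with a 2-sparse ground truth (assumption)
data_rng = random.Random(1)
w_true = [2.0, 0.0, -3.0, 0.0, 0.0]
X = [[data_rng.gauss(0.0, 1.0) for _ in w_true] for _ in range(40)]
Y = [sum(a * b for a, b in zip(x, w_true)) for x in X]
w_hat = svrg_iht(X, Y, k=2)
print(w_hat)
```

The projection step is where a graph-structured method would differ: instead of keeping the k largest entries, it would project onto supports that satisfy the graph constraint.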

algorithm, gradient, graph scsg-iht, (12 more...)

2407.16968

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > North Carolina (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting

Tyurin, Alexander, Richtárik, Peter

arXiv.org Artificial Intelligence | Jan-3-2024

We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Regardless of the communication compression feature, our method successfully combines variance reduction and partial participation: we get the optimal oracle complexity, never need the participation of all nodes, and do not require the bounded gradients (dissimilarity) assumption.

communication round, np 2, step size, (13 more...)

2205.1558

Country: Asia > Middle East > Saudi Arabia (0.04)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)
